Data Analytics Tempo: the ability to consistently develop novel, operationally relevant data workflows through well-planned, tractable projects.
McKinsey research estimates that 70 percent of complex, large-scale digital transformations fail to meet their goals (Michael Bucy, Adrian Finlayson, and Chris Moye n.d.), in part because large companies fail to recognize the breadth of commitment needed to transform business practices in an impactful manner. Thomas Davenport and George Westerman, writing in the Harvard Business Review, assert:
“Digital (transformation) is not just a thing that you can buy and plug into the organization. It is multi-faceted and diffuse, and doesn’t just involve technology. Digital transformation is an ongoing process of changing the way you do business. It requires foundational investments in skills, projects, infrastructure, and, often, in cleaning up IT systems. It requires mixing people, machines, and business processes, with all of the messiness that entails. It also requires continuous monitoring and intervention, from the top, to ensure that both digital leaders and non-digital leaders are making good decisions about their transformation efforts.”
“Why So Many High-Profile Digital Transformations Fail,” Harvard Business Review, March 2018
Although our organization’s transformation also includes efforts to improve infrastructure and clean up IT systems, this paper focuses on how we have invested in skills and projects to better incorporate data analytics into critical business practices. In short, it describes the fundamental constructs we have used to bring analytics into critical business functions.
We define data analytics tempo as the ability to consistently develop novel, operationally relevant data workflows through well-planned, tractable projects, and our data analytics strategy prioritizes tempo as the primary means of growing our organization’s ability to employ analytics toward impactful battlefield effects. We have stopped talking about “big data” and started growing our digital literacy by “doing data science,” resulting in individual, team, and organizational growth. Moreover, these efforts inform our priorities with respect to infrastructure and IT system updates.
The purpose of this paper is to share how we have incorporated doctrine from both industry and academia to inform project identification and organizational design, develop business processes, and conduct planning. Our goal is to share this nascent doctrine with mission partners, exchange lessons learned, and garner critique and recommendations from fellow practitioners. We start by describing how we identify and prioritize projects, then discuss how we task organize for tempo. Finally, we share frameworks for planning and executing projects, and we conclude by introducing next steps as we move toward increasing the scale, rigor, and impact of our analytics program.
Transforming data science into capability is similar to many of the highly technical disciplines employed on today’s battlefield. For illustrative purposes we compare data science to close air support. Both functions require specific talent, equipment, and training to achieve desired battlefield effects, and synchronizing these effects requires practice. We cannot gain mastery without “doing” data science as an organization. We use common frameworks from the research community, industry, and defense to provide operational leaders a minimal set of principles with which to begin growing through ruthless experimentation. Our goal has not been to conduct machine learning or artificial intelligence; our goal is to deliver impact through insight, and we therefore prioritize tractable projects that deliver insight to end users within 15-60 days. However, identifying and framing impactful analytics problems is not trivial. The majority of our operational leaders do not understand exactly how data analytics can accelerate decisions and illuminate insight. They are not able to “request data support” in a manner similar to requesting close air support, which led to our decision to initially prioritize decentralized, multidisciplinary, deployed data teams, described in the next section.
We call these multidisciplinary teams Tactical Data Exploitation Teams (TDETs) and employ them based on three operating principles.
Tactical Data Exploitation Team (TDET) Operating Principles:

- Data exploitation is multidisciplinary and requires operations, intelligence, and data professionals.
- Mastery requires both repetition and progression.
- Talent generates tempo, and tempo attracts talent.
Attracting, growing, and retaining our data analytics talent is largely a function of our ability to employ them against relevant and challenging problem sets. However, data scientists are an essential but insufficient component of a TDET. The greatest challenge with respect to project identification is that the expertise needed to assess project feasibility and utility lies in different sets of domain expertise. Data scientists understand how to apply algorithms to extract meaningful patterns from data, but they are not qualified to interpret model output or identify the most operationally relevant insights. As Davenport and Westerman state, we cannot just buy data scientists and plug them into the organization; we must mix people, machines, and business processes. Therefore, we task organize with the goal of gaining tempo by:
The steps listed above have enabled the SOJTF to construct TDETs at multiple forward-deployed locations with highly impactful results. We also use a similar CONUS construct for reach-back data analytics support; however, it is important to note that we generated tempo initially in a decentralized manner (i.e., at the edge) and are now centralizing our data science culture and scaling our capability through CONUS force structure. This hybrid model of decentralized application with centralized culture has become increasingly popular in industry, and we recommend others also initially prioritize efforts as close to “real analysts with real questions and real deadlines” as possible.
Ultimately, we need to assess value in order to prioritize projects. The Heilmeier Catechism (Defense Advanced Research Projects Agency (DARPA) n.d.) was developed at DARPA to propose and select research projects and has been broadly used in industry and academia (Bill Scherlis n.d.). This framework consists of eight elegant questions that force a rigorous understanding of a given project’s cost, benefit, and value before selection:

- What are you trying to do? Articulate your objectives using absolutely no jargon.
- How is it done today, and what are the limits of current practice?
- What is new in your approach and why do you think it will be successful?
- Who cares? If you are successful, what difference will it make?
- What are the risks?
- How much will it cost?
- How long will it take?
- What are the mid-term and final “exams” to check for success?
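One concrete way to enforce this gate, sketched below in Python purely as an illustration, is to treat the catechism as a required checklist that every proposal must complete before it competes for TDET time. The class, the 20-character completeness heuristic, and the example answer are our own assumptions, not part of the catechism or of any prescribed tool.

```python
from dataclasses import dataclass, field

# The eight Heilmeier questions, used here as a pre-selection checklist.
CATECHISM = [
    "What are you trying to do? Articulate your objectives using no jargon.",
    "How is it done today, and what are the limits of current practice?",
    "What is new in your approach and why do you think it will be successful?",
    "Who cares? If you are successful, what difference will it make?",
    "What are the risks?",
    "How much will it cost?",
    "How long will it take?",
    "What are the mid-term and final exams to check for success?",
]

@dataclass
class ProjectProposal:
    title: str
    answers: dict = field(default_factory=dict)  # question number (1-8) -> the team's answer

    def unanswered(self):
        """Return question numbers still lacking a substantive answer (heuristic: < 20 chars)."""
        return [i for i in range(1, len(CATECHISM) + 1)
                if len(self.answers.get(i, "").strip()) < 20]

    def ready_for_review(self):
        """A proposal only competes for TDET time once every question is addressed."""
        return not self.unanswered()

proposal = ProjectProposal(title="Bot-driven propaganda campaign detection")
proposal.answers[1] = "Identify coordinated, automated amplification of extremist messaging in open-source media."
print(proposal.unanswered())        # -> [2, 3, 4, 5, 6, 7, 8] until the team completes the plan
print(proposal.ready_for_review())  # -> False
```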
To answer these questions effectively, an analytic development plan (ADP) has four essential elements of analysis: a desired knowledge outcome (Operational Understanding), an understanding of how available data can be leveraged toward that outcome (Data Understanding), a tractable methodological approach, and an execution plan and timeline. Each element of analysis has specific information requirements and draws on different forms of expertise within the organization. We describe each in this section.
An ADP must be developed with a specific objective or desired knowledge outcome. It is also paramount that this outcome is useful, feasible, and measurable. While the operational SME is the most qualified team member to estimate utility, he/she will depend on the team’s data and analytic expertise to bound the outcome with respect to feasibility. For example, the operational SME may want to geospatially predict attack locations for inspired attacks based on social media data. However, the team’s analytic expertise could propose a more feasible objective, such as identification of bot-driven propaganda campaigns, which could offer insight into areas where ISIS is trying to foment violence.
Gaining data understanding often starts with the operational SME’s intuition. The operational SME thinks there is information within a given data set; however, he/she cannot extract that information due to the scale and/or complexity of the data. It is the data SME’s responsibility to gain enough understanding of the data’s structure to estimate how it can be exploited methodologically and how difficult it will be to make the data usable for the project. Again, this phase emphasizes the need for both operational experts and data experts.
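A data SME’s first pass at this kind of data understanding might look like the minimal sketch below; the file name and columns are hypothetical, and the specific checks (row count, date coverage, missingness, cardinality) are simply ones we have found broadly useful rather than a mandated profile.

```python
import pandas as pd

# First-pass profile to judge whether a data set can support the desired
# knowledge outcome. File and column names are hypothetical placeholders.
df = pd.read_csv("reporting_extract.csv", parse_dates=["report_date"])

profile = {
    "rows": len(df),
    "date_range": (df["report_date"].min(), df["report_date"].max()),
    "share_missing_by_column": df.isna().mean().sort_values(ascending=False).head(10),
    "near_unique_columns": [c for c in df.columns if df[c].nunique() > 0.9 * len(df)],
}

for name, value in profile.items():
    print(f"--- {name} ---\n{value}\n")
```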
Given the desired knowledge outcome and the proposed data, the analytic SME identifies a methodological framework and evaluation schema and communicates the time required and potential benefit of the project. Selection of analytic methodologies is highly dependent on both the desired knowledge outcome and the data available, thus requiring both operational and analytic SME input. For example, if foreign fighter facilitation has specific tradecraft which can be detected through machine learning or anomaly detection within a data set, it is critical that the operational SME describe that tradecraft in detail so that it can be extracted with appropriate quantitative methodologies. Moreover, constructing the experiment in a manner that enables the team to evaluate the uncertainty associated with model output requires a rigorous understanding of design of experiments. Evaluation of model performance also often requires subject matter expertise within the application domain. A clearly defined methodological framework and evaluation schema are needed to address questions 3 and 8 of the Heilmeier Catechism: what is new in your approach, why do you think it will be successful, and what are the mid-term and final “exams” to check for success? It is important to specify measures of success a priori. Elements 1-3 of the ADP should adequately answer questions 1-4 of the Heilmeier Catechism: what do you want to do, why is current practice not adequate, why do you think the new approach will be successful, and what difference will it make?
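To make “specify the exam a priori” concrete, the sketch below uses scikit-learn’s IsolationForest on synthetic data as a stand-in for whatever method the analytic SME actually selects; the review budget K and the 0.5 precision threshold represent values that would be agreed in the ADP, and none of the numbers come from a real project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Fix the "exam" before modeling: success is precision within the top-K flagged
# records, judged against SME-labeled ground truth. All data here is synthetic.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))   # stand-in feature matrix (e.g., behavioral features)
X[:25] += 4.0                    # a small injected anomalous cohort for the dry run
labels = np.zeros(len(X), dtype=bool)
labels[:25] = True               # ground truth for the dry run only

K = 50                   # analysts can realistically review ~50 leads per cycle
SUCCESS_THRESHOLD = 0.5  # agreed with the operational SME before modeling began

model = IsolationForest(n_estimators=200, random_state=0).fit(X)
scores = -model.score_samples(X)            # higher = more anomalous
top_k = np.argsort(scores)[::-1][:K]

precision_at_k = labels[top_k].mean()
print(f"precision@{K} = {precision_at_k:.2f} (pass: {precision_at_k >= SUCCESS_THRESHOLD})")
```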
The execution plan and timeline lays out a set of objectives and milestones and addresses questions 5-8 of the Heilmeier Catechism: what are the risks, how much will the project cost, how long will it take, and what are the metrics used to assess interim and final performance?
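One lightweight way a team might record these four elements and the execution plan so they can be reviewed and tracked is sketched below; every field name, the example content, and the duration helper are our own assumptions about a reasonable schema, not a format prescribed by the Heilmeier Catechism or CRISP-DM.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Milestone:
    description: str
    due: date
    metric: str  # how interim or final performance is judged at this point

@dataclass
class AnalyticDevelopmentPlan:
    operational_understanding: str  # desired knowledge outcome / key intelligence question
    data_understanding: str         # data sources, limitations, preparation required
    methodological_approach: str    # candidate methods and evaluation schema
    risks: str
    estimated_cost: str
    milestones: list = field(default_factory=list)

    def duration_days(self) -> int:
        """Planned span between the first and last milestone, in days."""
        if not self.milestones:
            return 0
        dues = [m.due for m in self.milestones]
        return (max(dues) - min(dues)).days

adp = AnalyticDevelopmentPlan(
    operational_understanding="Identify bot-driven propaganda campaigns in the operating area",
    data_understanding="Open-source social media collection; sparse geotags; heavy deduplication required",
    methodological_approach="Coordination/anomaly detection, evaluated by precision@50 on SME-labeled leads",
    risks="Label scarcity; platform collection changes",
    estimated_cost="One TDET for roughly 45 days",
    milestones=[
        Milestone("ADP peer review", date(2020, 10, 1), "plan approved"),
        Milestone("Final model review", date(2020, 11, 10), "precision@50 >= 0.5"),
    ],
)
print(adp.duration_days())  # -> 40, within the 15-60 day window we target
```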
Each ADP is then executed in four phases, as depicted in Figure 1. It is important to reiterate that phases 1-3 are often performed iteratively prior to deployment of the analytic, though little update is typically needed with respect to the key intelligence question.
Just as any mission requires deliberate planning, it must also be executed with established processes and norms. Analytic development, or “modeling,” has been used by industry, academia, and government for decades, and many frameworks exist. We have used one of the most popular, the Cross-Industry Standard Process for Data Mining (CRISP-DM) (“Cross-Industry Standard Process for Data Mining” 2020), as the baseline for our Analytic Development Process, which is depicted in Figure 1. This process can be broken down into four phases: operational understanding, data understanding, model development and analysis, and operationalizing insight.
Phase 1, Operational Understanding, starts by identifying an operationally relevant outcome, or “key intelligence question,” that our analytic will address. Phase 2, Data Understanding, identifies data sources relevant to the desired knowledge outcome and works through the potential and limitations of that data as well as the steps needed to make it usable for modeling. Phase 3, Model Development, selects relevant data mining methodologies based on an understanding of the data and the desired outcome and iterates on model development until a desired level of certainty is met. Phase 4, Operationalizing Insight, focuses on delivering those insights to the end user in a manner that supports operational decisions.
Peer review is an essential element of the analytic development process as well. We recommend at least two validation points within the process to ensure quality: both project development/planning and result interpretation require peer review and validation. Due to the breadth and technical complexity of many of our projects, we often need expertise not resident in the TDET to validate assumptions and conclusions. The red octagons in Figure 1 depict these two peer review checks within our process. The first check occurs at the formulation of the analytic development plan and prevents poor time investment caused by flawed initial assumptions stemming from incomplete operational, data, or methodological understanding. The second check ensures responsible interpretation of model results through peer review and is essential given the nature of our application domain. Even when outside expertise is not required, these checks remain necessary.
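As an illustration of how the two review gates fit into the four-phase flow, the skeleton below makes them explicit control points; every function body is a trivial placeholder for real team activity, and none of the names refer to actual tooling.

```python
from dataclasses import dataclass

@dataclass
class Reviewer:
    name: str
    def approve(self, artifact) -> bool:
        return bool(artifact)  # stand-in for an out-of-team technical review

def develop_plan(question):      # Phase 1 (iterating with 2-3): write and refine the ADP
    return {"question": question, "method": "anomaly detection"}

def prepare_data(plan):          # Phase 2: make the data usable for modeling
    return ["record-1", "record-2"]

def develop_model(plan, data):   # Phase 3: iterate until the a priori exam is met
    return {"flags": data[:1]}

def operationalize(results):     # Phase 4: deliver insight to the end user
    return f"delivered {len(results['flags'])} leads"

def run_analytic_project(question, reviewers):
    plan = develop_plan(question)
    if not all(r.approve(plan) for r in reviewers):     # review check 1: validate the ADP
        return None
    data = prepare_data(plan)
    results = develop_model(plan, data)
    if not all(r.approve(results) for r in reviewers):  # review check 2: validate interpretation
        return None
    return operationalize(results)

print(run_analytic_project("Where is bot-driven amplification concentrated?", [Reviewer("external SME")]))
```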
Our analytic development process provides a general framework that allows operational leaders, application domain experts, and data experts to deliberately manage the lifecycle of an analytic development project. The process helps maintain discipline as we pursue tempo, and provides a means to synchronize efforts across multiple teams.
Again, we assert that we cannot effectively exploit our data holdings without routine, rigorous experimentation. In this paper we have proposed a task organization and a framework for planning, executing, and evaluating analytic development projects. Our goal is to continue to gain mastery by executing projects of increasing complexity, scale, and impact. Within the SOJTF we have achieved data analytics tempo and reached an inflection point in our digital transformation. We must now deliberately resource disciplined innovation within our data analytics culture, and we can do this in two ways. First, we must capitalize on our organizational commitment to sharing best practices and TTPs. Data science and software development have highly active open-source collaborative communities online, and we should be able to develop similar defense/intelligence collaborative communities on all of our classified networks; continued investment in high-side collaboration will drive innovation. Second, now that we have developed a steady demand signal for analytics support at the SOJTF, we must invest in centralized force structure that provides analytics as a service, provides oversight, and ensures broad use of best practices developed both inside and outside of the SOF enterprise.
The SOJTF has proven to be an effective battle lab for growing nascent doctrine for operational data analytics. We have shared how we have incorporated doctrine from both industry and academia to inform project identification and task organization, develop business processes, and conduct planning. We described how we think about and prioritize projects and discussed how we task organize for tempo. Finally, we shared how we plan projects, presented our analytic development framework for project execution, and highlighted next steps in our digital transformation. We will continue to pursue knowledge by sharing it, and we welcome questions, critique, and advice.
Bill Scherlis. n.d. “Evaluation and Validity for SEI Research Projects.” SEI Blog, Software Engineering Institute, Carnegie Mellon University. Accessed September 8, 2020. https://insights.sei.cmu.edu/sei_blog/2013/02/evaluation-and-validity-for-sei-research-projects.html.
Defense Advanced Research Projects Agency (DARPA). n.d. “The Heilmeier Catechism.” Accessed September 8, 2020. https://www.darpa.mil/work-with-us/heilmeier-catechism.
Michael Bucy, Adrian Finlayson, and Chris Moye. n.d. “The ‘How’ of Transformation.” McKinsey & Company. Accessed September 8, 2020. https://www.mckinsey.com/industries/retail/our-insights/the-how-of-transformation#.